Named Entity Recognition for Arabic Social Media

Authors

  • Ayah Zirikly
  • Mona T. Diab
Abstract

The majority of research on Arabic Named Entity Recognition (NER) addresses the task for the newswire genre, where the language used is Modern Standard Arabic (MSA); however, the need to study this task in social media is becoming more vital. Social media is characterized by the use of both MSA and Dialectal Arabic (DA), often with code switching between the two language varieties. Despite some common characteristics between MSA and DA, there are significant differences between them, which result in poor performance when MSA-targeting systems are applied for NER in DA. Additionally, most NER systems rely primarily on gazetteers, which can be more challenging in a social media processing context due to an inherent low coverage. In this paper, we present a gazetteer-free NER system for Dialectal data that yields an F1 score of 72.68%, which is an absolute improvement of ≈ 2-3% over a comparable state-of-the-art gazetteer-based DA-NER system.
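
For readers unfamiliar with how an NER F1 score is typically computed, the following is a minimal sketch, not the authors' evaluation script. It assumes BIO-tagged token sequences and exact-match, span-level scoring (a common convention in NER evaluation); the function names `bio_to_spans` and `f1_score` and the toy tag sequences are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# A span is (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match span-level F1: harmonic mean of precision and recall."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented tags): the system finds the person but misses the location.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "O"]
score = f1_score(gold_tags, pred_tags)
```

In this toy case precision is 1.0 and recall is 0.5. An "absolute" improvement, as reported in the abstract, is a difference in percentage points between two such scores rather than a ratio of them.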


Similar Resources

Person Name Recognition Using Injection of Candidate Name Words into Conditional Random Fields for Arabic

Named Entity Recognition and Extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility for resolving fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...


Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media

Named Entity Recognition (NER) is the task of classifying or labelling atomic elements in the text into categories such as Person, Location or Organisation. For the Arabic language, recognizing named entities is a challenging task because of the complexity and the unique characteristics of this language. In addition, most of the previous work focuses on Modern Standard Arabic (MSA); however, recogn...


An Approach for Extracting and Disambiguating Arabic Persons' Names Using Clustered Dictionaries and Scored Patterns

Building a system to extract Arabic named entities is a complex task due to the ambiguity and structure of Arabic text. Previous approaches that have tackled the problem of Arabic named entity recognition relied heavily on Arabic parsers and taggers combined with a huge set of gazetteers and sometimes large training sets to solve the ambiguity problem. But while these approaches are applicable ...


A Novel Approach for Detecting Arabic Persons' Names using Limited Resources

Named entity recognition is an involved task and is one that usually requires the usage of numerous resources. Recognizing Arabic entities is an even more difficult task due to the inherent ambiguity of the Arabic language. Previous approaches that have tackled the problem of Arabic named entity recognition have used Arabic parsers and taggers combined with a huge set of gazetteers and sometime...


Named Entity Recognition of Persons' Names in Arabic Tweets

The rise in Arabic usage on various social media platforms, notably Twitter, has led to a growing interest in building Arabic Natural Language Processing (NLP) applications capable of dealing with informal colloquial Arabic, as it is the most commonly used form of Arabic in social media. The unique characteristics of the Arabic language make the extraction of Arabic named entities a ...



Publication date: 2015